OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Recognizing Characters of Ancient Manuscripts

Internal identifier: 000769 (Main/Exploration); previous: 000768; next: 000770

Authors: Markus Diem [Austria]; Robert Sablatnig [Austria]

Source: Proceedings of SPIE, the International Society for Optical Engineering (ISSN 0277-786X), 2010

RBID : Pascal:10-0398508

French descriptors

English descriptors

Abstract

For printed Latin text, the main issues of Optical Character Recognition (OCR) systems are solved. For degraded handwritten document images, however, even basic preprocessing steps such as binarization yield poor results with state-of-the-art methods. This paper investigates ancient Slavonic manuscripts from the 11th century. To minimize the consequences of false character segmentation, a binarization-free approach based on local descriptors is proposed. In addition, local information allows the recognition of partially visible or washed-out characters. The proposed algorithm consists of two steps: character classification and character localization. First, Scale Invariant Feature Transform (SIFT) features are extracted and then classified using Support Vector Machines (SVM). Afterwards, the interest points are clustered according to their spatial information. Characters are thereby localized and finally recognized by a weighted voting scheme over the pre-classified local descriptors. Preliminary results show that the proposed system can handle highly degraded manuscript images with background clutter (e.g. stains, tears) and faded-out characters.
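
The localization and voting steps described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not name the clustering method or the vote weights, so this sketch assumes single-linkage grouping within a fixed radius and uses a per-descriptor classifier confidence as the weight. SIFT extraction and SVM classification are assumed to have happened upstream; their output is represented by hypothetical tuples `(x, y, label, confidence)`, and the function names `cluster_points`, `vote`, and `recognize` are invented for illustration.

```python
from collections import defaultdict
from math import hypot


def cluster_points(points, radius):
    """Group point indices by single linkage: two points share a cluster
    if a chain of neighbours, each at most `radius` apart, connects them."""
    clusters = []
    unvisited = set(range(len(points)))
    while unvisited:
        seed = unvisited.pop()
        members, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            xi, yi = points[i][0], points[i][1]
            near = [j for j in unvisited
                    if hypot(points[j][0] - xi, points[j][1] - yi) <= radius]
            for j in near:
                unvisited.remove(j)
            members.extend(near)
            frontier.extend(near)
        clusters.append(sorted(members))
    return clusters


def vote(points, members):
    """Sum the confidences per label and return the label with the highest total."""
    scores = defaultdict(float)
    for i in members:
        _, _, label, weight = points[i]
        scores[label] += weight
    return max(scores, key=scores.get)


def recognize(points, radius):
    """Return (centroid_x, centroid_y, label) for each spatial cluster."""
    results = []
    for members in cluster_points(points, radius):
        cx = sum(points[i][0] for i in members) / len(members)
        cy = sum(points[i][1] for i in members) / len(members)
        results.append((cx, cy, vote(points, members)))
    return sorted(results)


# Hypothetical pre-classified interest points: (x, y, label, confidence).
POINTS = [
    (10, 10, "a", 0.9), (12, 11, "b", 0.4), (11, 13, "a", 0.3),
    (50, 10, "b", 0.6), (52, 12, "b", 0.2), (51, 9, "c", 0.7),
]

for cx, cy, label in recognize(POINTS, radius=5.0):
    print(f"character '{label}' near ({cx:.1f}, {cy:.1f})")
```

In this toy input, each cluster contains one descriptor that disagrees with the majority; the summed confidences still select a single label per cluster, which is the point of voting over local descriptors rather than relying on one segmentation result.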


Affiliations:



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Recognizing Characters of Ancient Manuscripts</title>
<author>
<name sortKey="Diem, Markus" sort="Diem, Markus" uniqKey="Diem M" first="Markus" last="Diem">Markus Diem</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Vienna University of Technology Pattern Recognition and Image Processing Group Favoritenstr. 9/183-2</s1>
<s2>1040 Vienna</s2>
<s3>AUT</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Autriche</country>
<placeName>
<region type="land" nuts="2">Vienne (Autriche)</region>
<settlement type="city">Vienne (Autriche)</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Sablatnig, Robert" sort="Sablatnig, Robert" uniqKey="Sablatnig R" first="Robert" last="Sablatnig">Robert Sablatnig</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Vienna University of Technology Pattern Recognition and Image Processing Group Favoritenstr. 9/183-2</s1>
<s2>1040 Vienna</s2>
<s3>AUT</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Autriche</country>
<placeName>
<region type="land" nuts="2">Vienne (Autriche)</region>
<settlement type="city">Vienne (Autriche)</settlement>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">10-0398508</idno>
<date when="2010">2010</date>
<idno type="stanalyst">PASCAL 10-0398508 INIST</idno>
<idno type="RBID">Pascal:10-0398508</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000170</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000607</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000143</idno>
<idno type="wicri:doubleKey">0277-786X:2010:Diem M:recognizing:characters:of</idno>
<idno type="wicri:Area/Main/Merge">000774</idno>
<idno type="wicri:Area/Main/Curation">000769</idno>
<idno type="wicri:Area/Main/Exploration">000769</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Recognizing Characters of Ancient Manuscripts</title>
<author>
<name sortKey="Diem, Markus" sort="Diem, Markus" uniqKey="Diem M" first="Markus" last="Diem">Markus Diem</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Vienna University of Technology Pattern Recognition and Image Processing Group Favoritenstr. 9/183-2</s1>
<s2>1040 Vienna</s2>
<s3>AUT</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Autriche</country>
<placeName>
<region type="land" nuts="2">Vienne (Autriche)</region>
<settlement type="city">Vienne (Autriche)</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Sablatnig, Robert" sort="Sablatnig, Robert" uniqKey="Sablatnig R" first="Robert" last="Sablatnig">Robert Sablatnig</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Vienna University of Technology Pattern Recognition and Image Processing Group Favoritenstr. 9/183-2</s1>
<s2>1040 Vienna</s2>
<s3>AUT</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Autriche</country>
<placeName>
<region type="land" nuts="2">Vienne (Autriche)</region>
<settlement type="city">Vienne (Autriche)</settlement>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<title level="j" type="abbreviated">Proc. SPIE Int. Soc. Opt. Eng.</title>
<idno type="ISSN">0277-786X</idno>
<imprint>
<date when="2010">2010</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Proceedings of SPIE, the International Society for Optical Engineering</title>
<title level="j" type="abbreviated">Proc. SPIE Int. Soc. Opt. Eng.</title>
<idno type="ISSN">0277-786X</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Image analysis</term>
<term>Imagery</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Analyse image</term>
<term>Imagerie</term>
<term>Algorithme</term>
<term>0130C</term>
<term>4230</term>
<term>Méthode</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Considering printed Latin text, the main issues of Optical Character Recognition (OCR) systems are solved. However, for degraded handwritten document images, basic preprocessing steps such as binarization, gain poor results with state-of-the-art methods. In this paper ancient Slavonic manuscripts from the 11th century are investigated. In order to minimize the consequences of false character segmentation, a binarization-free approach based on local descriptors is proposed. Additionally local information allows the recognition of partially visible or washed out characters. The proposed algorithm consists of two steps: character classification and character localization. Initially Scale Invariant Feature Transform (SIFT) features are extracted which are subsequently classified using Support Vector Machines (SVM). Afterwards, the interest points are clustered according to their spatial information. Thereby, characters are localized and finally recognized based on a weighted voting scheme of pre-classified local descriptors. Preliminary results show that the proposed system can handle highly degraded manuscript images with background clutter (e.g. stains, tears) and faded out characters.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Autriche</li>
</country>
<region>
<li>Vienne (Autriche)</li>
</region>
<settlement>
<li>Vienne (Autriche)</li>
</settlement>
</list>
<tree>
<country name="Autriche">
<region name="Vienne (Autriche)">
<name sortKey="Diem, Markus" sort="Diem, Markus" uniqKey="Diem M" first="Markus" last="Diem">Markus Diem</name>
</region>
<name sortKey="Sablatnig, Robert" sort="Sablatnig, Robert" uniqKey="Sablatnig R" first="Robert" last="Sablatnig">Robert Sablatnig</name>
</country>
</tree>
</affiliations>
</record>
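
A record like the one above can also be read with general-purpose tools. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`, not part of Dilib. As displayed here, the record uses the prefixes `wicri:` and `inist:` without declaring them, which a strict XML parser rejects; the sketch therefore wraps the fragment in a hypothetical `<wrapper>` element that binds both prefixes to placeholder URIs. Those URIs, the shortened `RECORD` string, and the helper names `parse_record` and `summarize` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the record: the title and the two <author> entries.
RECORD = """<record><TEI><teiHeader><fileDesc><titleStmt>
<title xml:lang="en" level="a">Recognizing Characters of Ancient Manuscripts</title>
<author>
<name sortKey="Diem, Markus" first="Markus" last="Diem">Markus Diem</name>
<affiliation wicri:level="3"><country>Autriche</country></affiliation>
</author>
<author>
<name sortKey="Sablatnig, Robert" first="Robert" last="Sablatnig">Robert Sablatnig</name>
<affiliation wicri:level="3"><country>Autriche</country></affiliation>
</author>
</titleStmt></fileDesc></teiHeader></TEI></record>"""

# Placeholder namespace URIs (assumptions): they only make the prefixes parseable.
WRAPPER = ('<wrapper xmlns:wicri="urn:example:wicri" '
           'xmlns:inist="urn:example:inist">{}</wrapper>')


def parse_record(fragment):
    """Parse a record fragment and return its <record> element."""
    return ET.fromstring(WRAPPER.format(fragment)).find("record")


def summarize(record):
    """Return the title, (last, first) author pairs, and the sorted set of countries."""
    title = record.findtext(".//titleStmt/title")
    authors = [(n.get("last"), n.get("first"))
               for n in record.findall(".//titleStmt/author/name")]
    countries = sorted({c.text for c in record.iter("country")})
    return title, authors, countries


title, authors, countries = summarize(parse_record(RECORD))
print(title)
print(authors)
print(countries)
```

Restricting the author lookup to `titleStmt` avoids counting each author twice, since the full record repeats the same `<author>` blocks under `sourceDesc`.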

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000769 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000769 | SxmlIndent | more

To link to this page from within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:10-0398508
   |texte=   Recognizing Characters of Ancient Manuscripts
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024